Abstract

Social networks have emerged as a critical factor in infor- 
mation dissemination, search, marketing, expertise and influ- 
ence discovery, and potentially an important tool for mobiliz- 
ing people. Social media has made social networks ubiqui- 
tous, and also given researchers access to massive quantities 
of data for empirical analysis. These data sets offer a rich 
source of evidence for studying dynamics of individual and 
group behavior, the structure of networks and global patterns 
of the flow of information on them. However, in most pre- 
vious studies, the structure of the underlying networks was 
not directly visible but had to be inferred from the flow of 
information from one individual to another. As a result, we 
do not yet understand dynamics of information spread on net- 
works or how the structure of the network affects it. We ad- 
dress this gap by analyzing data from two popular social news 
sites. Specifically, we extract social networks of active users 
on Digg and Twitter, and track how interest in news stories 
spreads among them. We show that social networks play a 
crucial role in the spread of information on these sites, and 
that network structure affects dynamics of information flow. 



Introduction 

Social scientists have long recognized the importance
of social networks in the spread of information
(Granovetter 1973) and innovation (Rogers 2003).
Modern communications technologies, notably email and
more recently social media, have only enhanced the role of
networks in marketing (Domingos and Richardson 2001;
Kempe, Kleinberg, and Tardos 2003), information
dissemination (Wu et al. 2004; Gruhl and Liben-Nowell 2004),
search (Adamic and Adar 2005), and expertise discovery
(Davitz et al. 2007). The recent DARPA Network
Challenge¹ successfully tested the ability of online social
networks to mobilize massive ad-hoc teams to solve real- 
world problems, which could potentially improve disaster 
response and coordination of relief efforts. In addition 
to making social networks ubiquitous, social media sites 
have given researchers access to massive quantities of 
data for empirical analysis. These data sets offer a rich 
source of evidence for studying the structure of social
networks (Leskovec and Horvitz 2008) and the dynamics
of individual (Vazquez et al. 2006) and group behavior
(Hogg and Lerman 2009), efficacy of viral product
recommendation (Leskovec, Adamic, and Huberman 2006),
global properties of the spread of email messages
(Wu et al. 2004; Liben-Nowell and Kleinberg 2008)
and blog posts (Leskovec et al. 2007b), and identification
of influential blogs (Gruhl and Liben-Nowell 2004;
Leskovec et al. 2007a). In most of these studies, however,
the structure of the underlying network was not visible but 
had to be inferred from the flow of information from one 
individual to another. This posed a serious challenge to 
our efforts to understand how the structure of the network 
affects dynamics of information spread on it. 

Understanding this question is especially critical for the 
effective use of social media and peer production sys- 
tems, which often aggregate over activities of, or contri- 
butions made by, many people in order to identify trend- 
ing topics and noteworthy contributions. Most of these 
sites also highlight activities of a person's social network 
links. Since people create links to others who are sim- 
ilar to them, or whose contributions they find interest- 
ing, the dynamics of information on a social network may 
be different from its dynamics within the general popula- 
tion. Separating in-network from out-of-network activity al- 
lows us, among other things, to better estimate the inher- 
ent quality of the contributions (Crane and Sornette 2008) 
or predict their future activity (Hogg and Lerman 2010;
Lerman and Galstyan 2008). This will in turn allow us to
separate high-quality contributions from noise.

Copyright © 2010, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
1 https://networkchallenge.darpa.mil

Social news sites Digg and Twitter offer a unique opportu- 
nity to study dynamics of information spread on social net- 
works. Both sites have become important sources of timely 
information for people. The social news aggregator Digg 
allows users to submit links to news stories and vote on stories
submitted by other users. On the microblogging service
Twitter, users tweet short text messages that often contain
links to news stories, and comment on or retweet messages
of others. Both sites enable users to explicitly create links 
to other users they want to follow. Another important com- 
mon feature is data transparency, with both sites providing 
programmatic access to detailed data about story and user 
activity. 

This paper presents an empirical study of the role of social
networks in the spread of information on Digg and Twitter.
For our study we collected data about popular stories on
Digg and Twitter that includes information about who voted 
or retweeted the story and when. In addition, we extracted 
the social networks of active users on these sites. These
data sets allow us to empirically characterize individual
dynamics and network structure, and to map the spread of interest
in news stories through the network. First, we empirically
characterize the structure of social networks on both sites. 
While the number of fans a user has on each site exhibits 
a long-tail distribution, Digg's social network is denser and 
more interconnected than Twitter's, as judged by the number 
of reciprocated links and the network clustering coefficient. 
We also show that user activity on both sites has a power-law 
distribution, albeit with different exponents. Next, we study 
evolution of the number of votes stories receive. We show 
that user interface affects dynamics of votes, with evolution 
of Digg stories going through two distinct stages. Never- 
theless, the number of votes accumulated by stories on both 
sites saturates after a period of about a day to a value that 
reflects their popularity. Next, we study how information 
spreads through the social network by measuring how the 
number of in-network votes a story receives, i.e., votes from 
fans of the submitter or previous voters, changes in time. We 
show that the structure of the network affects dynamics of 
information spread, with information reaching nodes faster
in the denser network of Digg than in that of Twitter. However, Twitter
stories spread farther, as judged by the total number of in- 
network votes they receive. We conclude with a discussion 
of implications of the study. 

Social News 

Social media has become an important channel for people to 
share information. On Digg, Twitter, Slashdot, Reddit, and 
Facebook, among others, users post news or links to news 
stories, discuss them, and share their opinions in real time. 
Often, these sites are the first to break important news. After 
the failed attempt on Christmas 2009 to blow up a US commercial
airliner, Twitter was the first source to report new security
measures for international flights (Carr 2010). In addition
to news, these sites are being used as a tool to organize
people. For example, in the aftermath of the disputed elections
in Iran in June 2009, the opposition movement used
Twitter to mobilize the public, organize protests, and inform
people about the latest developments, which was all the more
vital in the absence of reliable official information sources.

Digg (http://digg.com) is a popular social news aggrega- 
tor with over 3 million registered users. Digg allows users 
to submit links to and rate news stories by voting on, or dig- 
ging, them. There are many new submissions every minute, 
over 16,000 a day. Digg picks about a hundred stories daily 
to feature on its front page. Although the exact promotion
mechanism is kept secret, it appears to take into account the
number of votes a story receives and the rate at which it receives them. Digg's
success is largely fueled by the emergent front page, created 
by the collective decisions of its many users. 

A newly submitted story goes to the upcoming stories list, 
where it remains for 24 hours, or until it is promoted to the 
front page, whichever comes first. Newly submitted stories
are displayed in reverse chronological order, with the most
recent story at the top of the list, 15 stories to a page.
Promoted (or 'popular') stories are also displayed in reverse
chronological order on the front pages, 15 stories to a page,
with the most recently promoted story at the top of the list.
The importance of being promoted has, among other things,
spawned a black market² that claims the ability to manipulate
the voting process.

Digg also allows users to designate friends and track their 
activities. The friends interface allows users to see the sto- 
ries friends recently submitted or voted for. The friendship 
relationship is asymmetric. When user A lists user B as a 
friend, A can watch the activities of B but not vice versa. We 
call A the fan of B. A newly submitted story is visible in the
upcoming stories list, as well as to the submitter's fans through
the friends interface. With each vote it also becomes visible
to the voter's fans. The friends interface can be accessed by
clicking on the Friends Activity tab at the top of any Digg page.
In addition, a story submitted or voted on by a user's friends
receives a green ribbon on the story's Digg badge, raising its
visibility to fans.

We used the Digg API to collect data about 3,553 stories promoted
to the front page in June 2009. The data associated
with each story contained the story title, story id, link,
submitter's name, submission time, list of voters and the time of
each vote, and the time the story was promoted to the front page.
In addition, we collected the list of voters' friends. From this 
information, we were able to reconstruct the fan network of 
Digg users who were active during the sample period. 
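The reconstruction of the fan network described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `build_fan_network`, the input layout (a mapping from each voter to the users that voter lists as friends), and the toy user names are all assumptions made for the example. It also assumes that only the fan side of each link is restricted to active users.

```python
from collections import defaultdict


def build_fan_network(friends_of, active_users):
    """Invert friend lists into a fan network.

    friends_of: dict mapping a user to the users that user lists as friends
                (hypothetical layout of the collected friend lists).
    active_users: iterable of users who voted during the sample period.
    Returns a dict mapping each user B to the set of active users A
    who list B as a friend, i.e. the active fans of B.
    """
    active = set(active_users)
    fans = defaultdict(set)
    for a, friends in friends_of.items():
        if a not in active:
            continue  # assumption: only active users count as fans
        for b in friends:
            if b != a:
                fans[b].add(a)  # A lists B as a friend, so A is a fan of B
    return dict(fans)


# Toy data with made-up user names.
friends_of = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "dave": ["alice", "zed"],
    "eve": ["alice"],  # eve never voted, so her link is ignored
}
active = ["alice", "bob", "carol", "dave"]
fans = build_fan_network(friends_of, active)
```

Because the friendship relation is asymmetric, the direction of each link matters: here `fans["alice"]` contains `bob` and `dave`, while `fans["bob"]` contains only `alice`.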

Twitter (http://twitter.com) is a popular social network- 
ing site that allows registered users to post and read short 
(at most 140 characters) text messages, which may con- 
tain URLs to online content, usually shortened by a URL 
shortening service such as bit.ly or tinyurl. A user can also 
retweet or comment on another user's post, usually prepending
it with a string "RT @x" where x is a user's name.
Posting a link on Twitter is analogous to submitting a new story
on Digg, and retweeting the post is analogous to voting for it. 
Like Digg, Twitter allows users to designate as friends other 
users whose posts they want to follow. Being a follower on 
Twitter is equivalent to being a fan on Digg. 

Twitter restricts large-scale access to its data to a lim- 
ited number of entities. One of these, Tweetmeme 
(http://tweetmeme.com), aggregates all Twitter posts to de- 
termine frequently retweeted URLs, categorizes the stories 
these URLs point to, and presents them as news stories in 
a fashion similar to Digg's front page. We collected data 
from Tweetmeme using specialized page scrapers developed
using Fetch Technologies' AgentBuilder tool. For each
story, we retrieved the name of the user who posted the link
to it, the time it was posted, the number of times the link
was retweeted, and details of up to 1,000 of the most recent
retweets. For each retweet, we extracted the name of the
user, the text and the time stamp of the retweet. We were limited
to the 1,000 most recent retweets by the structure of Tweetmeme.
We extracted 398 stories from Tweetmeme that were originally
posted between June 11, 2009 and July 3, 2009. Of
these, 329 stories had fewer than 1,000 retweets. Next, we
used the Twitter API to download profile information for each
user in the data set. The profile included the complete list of
the user's friends and followers.

2 As an example, see http://subvertandprofit.com

Characteristics of User Activity 

We define an active user as any user who voted for at least
one story on Digg or retweeted at least one story on
Twitter. There are 139,409 active Digg and 137,582 active
Twitter users in our sample. On Digg, 71,834 active
users designated at least one other user as a friend,
with a total of 258,220 friend links. Active users on
Twitter were connected to 6,200,051 users. From these
data, we were able to reconstruct the fan networks of active
users, i.e., active users who are watching activities
of other users. Figure 1 shows the distribution of the number
of active fans and followers per user. Digg's distribution,
shown in Fig. 1(a), has a long-tail shape that is
common to degree distributions in real-world complex networks
(Clauset, Shalizi, and Newman 2009). Twitter's distribution,
shown in Fig. 1(b), has a peak at around 100 followers
and a long tail.

As the numbers above suggest, the Digg social network
is denser and more tightly knit than the Twitter social network.
We measure density by the number of reciprocal friendship
links and the modified clustering coefficient. A reciprocal,
or mutual, friendship link exists when user A marks B as a
friend and vice versa. There were 125,219 such links among
279,725 distinct users in the Digg sample and 3,973,892
mutual links among 6,200,051 users in the Twitter sample.
Normalizing these counts by the number of all possible
mutual links in the network gives us the fraction of
mutual links $f_m$. For Digg $f_m = 3.20 \times 10^{-6}$, and for
Twitter $f_m = 2.07 \times 10^{-7}$, an order of magnitude smaller.
The clustering coefficient $f_c$ measures the degree to which
a node's network neighbors are interlinked. We define the
clustering coefficient for directed networks such as those
that exist on Digg and Twitter as the fraction of closed triangles
that exist out of all possible sets of three nodes, or
triples. For simplicity, we define a closed triangle as a cycle
of length three that exists when A lists B as a friend, B lists
C and C lists A as a friend. There were 166,239 such triangles
in the Digg network, giving us the clustering coefficient
$f_c = 7.60 \times 10^{-12}$, and 4,566,952 triangles on Twitter,
giving the clustering coefficient $f_c = 1.92 \times 10^{-14}$, which is
two orders of magnitude smaller. Due to the size of the networks,
we implemented these metrics using Hadoop³. We
suspect that the differences in density of the two networks
are due to their age, since Twitter is a more recent service
than Digg. With time, we expect the Twitter network to
grow denser (Leskovec, Kleinberg, and Faloutsos 2005) and
become as tightly knit as Digg.
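The two density measures can be illustrated with a short sketch. The function names and the toy graph below are hypothetical, and the normalizations are assumptions: the fraction of mutual links is divided by the number of unordered user pairs, n(n-1)/2, and the triangle count is divided by the number of ordered triples, n(n-1)(n-2). The text does not spell out the second normalization; it is used here only because it reproduces the reported values from the reported counts.

```python
def count_mutual_links(friends):
    """Count unordered pairs {A, B} where A lists B and B lists A."""
    return sum(
        1
        for a, listed in friends.items()
        for b in listed
        if a < b and a in friends.get(b, ())
    )


def count_directed_3cycles(friends):
    """Count cycles A -> B -> C -> A over three distinct users."""
    hits = 0
    for a, listed in friends.items():
        for b in listed:
            for c in friends.get(b, ()):
                if len({a, b, c}) == 3 and a in friends.get(c, ()):
                    hits += 1
    return hits // 3  # each cycle is found once per starting user


def mutual_link_fraction(num_mutual, num_users):
    """Mutual links divided by all unordered pairs of users."""
    return num_mutual / (num_users * (num_users - 1) / 2)


def cycle_fraction(num_cycles, num_users):
    """Assumed normalization: ordered triples n(n-1)(n-2)."""
    return num_cycles / (num_users * (num_users - 1) * (num_users - 2))


# Toy graph: A and B list each other; A -> B -> C -> A closes one cycle.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"A"}}
toy_mutual = count_mutual_links(toy)
toy_cycles = count_directed_3cycles(toy)

# Counts reported in the text.
fm_digg = mutual_link_fraction(125_219, 279_725)
fm_twitter = mutual_link_fraction(3_973_892, 6_200_051)
fc_digg = cycle_fraction(166_239, 279_725)
fc_twitter = cycle_fraction(4_566_952, 6_200_051)
```

The nested loops are adequate only for small graphs; at the scale of the networks studied here a distributed implementation is needed, as the text notes.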

Next, we characterize users' voting activity. The 139,409
active users in the Digg data set cast 3,018,197 votes on
3,553 stories. User activity is not uniform, as shown in the inset
of Fig. 1(a). While the majority of users cast fewer than 10 votes,
some users voted on thousands of stories over the sample
time period. The distribution of the number of retweets per
user in the Twitter data set has a similar shape, with the number
of retweets per user ranging from 1 to about 100. The
difference in slopes of these distributions is likely explained
by the level of effort (Wilkinson 2008) required to vote on
Digg vs retweet on Twitter.

Dynamics of Voting 

Our data sets contain a complete record of voting on Digg
front page stories and frequently retweeted stories on Twitter.
From these data we can reconstruct the dynamics of voting.
In addition to voting history, we also know the active fan
network of Digg and Twitter users and use this information
to check whether a particular voter is a fan of the submitter
or previous voters. We call these in-network votes fan votes.
This information allows us to study how interest in the story
spreads through the social networks on Digg and Twitter.

Figure 2(a) shows the evolution of the number of votes
received by three Digg stories about post-election unrest in
Iran in June 2009. While the details of the dynamics differ,
the general features of vote evolution are shared by all Digg
stories and can be described by a stochastic model of social
voting (Hogg and Lerman 2009). While in the upcoming
stories queue, a story accumulates votes at some slow rate.
The point where the slope abruptly changes corresponds to
promotion to the front page. After promotion the story is
visible to a large number of people, and the number of votes
grows at a faster rate. As the story ages, accumulation of
new votes slows down (Wu and Huberman 2007) and finally
saturates. Figure 2(b) shows the evolution of the number of
times stories on the same topics were retweeted. The number
of retweets grows smoothly until it saturates. It takes
about a day for the number of votes/retweets to saturate on
both sites.



Distribution of popularity. The total number of times a
story was voted for or retweeted reflects its popularity
among Digg and Twitter users respectively. The distribution
of story popularity on either site, shown in Figure 3, exhibits the
'inequality of popularity' (Salganik, Dodds, and Watts 2006),
with relatively few stories becoming very popular, accruing
thousands of votes, while most are much less popular,
receiving fewer than 500 votes⁴. The most common number
of votes received by a story is around 500 on Digg and 400 on Twitter.
These values are well described by a lognormal distribution
(shown as the red line in the figure).
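A log-normal fit of this kind can be sketched with the standard library alone. The text does not say how the fit was obtained, so the maximum-likelihood estimate below (mean and standard deviation of the logarithms of the vote counts) is an assumption, as are the function name `fit_lognormal` and the toy vote counts. The mode of the fitted distribution, exp(mu - sigma^2), corresponds to the most common number of votes.

```python
import math
import statistics


def fit_lognormal(counts):
    """Maximum-likelihood log-normal fit to positive counts.

    Returns (mu, sigma, mode, median), where mu and sigma are the mean
    and standard deviation of log(counts).
    """
    logs = [math.log(c) for c in counts]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # population form is the MLE
    mode = math.exp(mu - sigma ** 2)  # most common value under the fit
    median = math.exp(mu)
    return mu, sigma, mode, median


# Toy vote counts chosen so that log(votes) is exactly 5, 6 and 7.
votes = [math.exp(5), math.exp(6), math.exp(7)]
mu, sigma, mode, median = fit_lognormal(votes)
```

For a log-normal distribution the mode lies below the median, which in turn lies below the mean, so a few very popular stories can raise the average well above the typical story.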

The log-normal distribution of story popularity is typical
of the "heavy-tailed" distributions associated with social
production and consumption of content. In a heavy-tailed
distribution a small but non-vanishing number of items
generate an uncharacteristically large amount of activity. These
distributions have been observed in a variety of contexts,
including voting on Digg (Wu and Huberman 2007)
and Essembly (Hogg and Szabo 2009), edits of
Wikipedia articles (Wilkinson 2008), and music downloads
(Salganik, Dodds, and Watts 2006). Understanding
the origin of such distributions is the next challenge in
modeling user activity on social media sites.

4 This distribution applies to Digg's front page stories only. Stories
that are never promoted to the front page receive very few
votes, in many cases just a single vote from the submitter.

Figure 1: Distribution of user activity. (a) Number of active fans per user in the Digg data set vs the number of users with that
many fans. Inset shows distribution of voting activity, i.e., number of votes per user vs number of users who cast that many
votes. (b) Number of active followers per user in the Twitter data set vs the number of users with that many followers. Inset
shows distribution of retweeting activity.

Dynamics of Voting on Networks 

At the time of submission, a Digg story is visible on the
upcoming stories list and to the submitter's fans through the
friends interface. As users vote on the story, it becomes visible
to their own fans via the friends interface. Analogous
to the spread of a contagious disease (Newman 2002), interest
in the story cascades through the social network. When
the story is promoted to the front page, it becomes visible
to many nonfans, although users are still able to pick out
stories their friends liked through the green ribbon on the
story's Digg badge. Similarly, a new post on Twitter is visible
to the submitter's followers, and every user who retweets the
story broadcasts it to her own followers. Although aggregators
like Tweetmeme attempt to identify popular stories on
Twitter in Digg-like fashion, there is no evidence that they
boost their visibility to nonfans.

We can trace the cascade of interest in a story through
the underlying social network of Digg (Twitter) by
checking whether a new vote (retweet) came from a
fan (follower) of any of the previous voters, including
the submitter. We call such votes or retweets fan votes,
regardless of whether we are talking about Digg or Twitter.
Therefore, the cascade ("information contagion"
in the title of this article) starts with the story's submitter
and grows as the story accrues fan votes. Researchers
have studied information cascades in email chain letters
(Wu et al. 2004; Liben-Nowell and Kleinberg 2008)
and blog posts (Gruhl and Liben-Nowell 2004;
Leskovec et al. 2007b) in order to obtain insights
into the structure of the network, identify influential
nodes within it, or predict popularity of content
(Lerman and Galstyan 2008). Characterizing information
cascades is necessary for creating a model of the
dynamics of information on networks.
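The fan-vote test described above can be sketched as a single pass over a story's votes in time order. This is an illustrative sketch only: `trace_cascade`, the `fans` mapping (user to the set of that user's fans or followers), and the toy user names are assumptions introduced for the example.

```python
def trace_cascade(voters, fans):
    """Flag each vote as a fan vote or not.

    voters: user names in time order, with the submitter first.
    fans: dict mapping a user to the set of users who follow that user.
    Returns one boolean per vote after the submission: True when the
    voter is a fan of the submitter or of any earlier voter.
    """
    exposed = set(fans.get(voters[0], ()))  # the submitter's fans see the story first
    flags = []
    for voter in voters[1:]:
        flags.append(voter in exposed)
        exposed |= fans.get(voter, set())  # every vote exposes the voter's own fans
    return flags


# Toy story: u1 follows the submitter, u2 follows u1, u4 follows u3.
fans = {"sub": {"u1"}, "u1": {"u2"}, "u3": {"u4"}}
voters = ["sub", "u1", "u3", "u2", "u4"]
flags = trace_cascade(voters, fans)
cascade_size = sum(flags)
```

In the toy story `u3` is not a fan of any earlier voter, yet the later vote by `u4` still counts as a fan vote, because the definition refers to fans of any previous voter and not only of voters already inside the cascade.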

Dynamics and distribution of fan votes. The dashed lines
in Figure 2 show how the number of fan votes received by
each story grows in time. Their evolution is similar to that
of all votes, and growth saturates after a period of about a
day. The value at which growth saturates shows the story's
range, or how widely it penetrates the social network.
Figure 4 shows the distribution of cascade sizes generated by
Digg and Twitter stories. These distributions are markedly
different from the distribution of story popularity shown in
Fig. 3. Although the distribution of network cascades of
Digg stories, Fig. 4(a), is slightly asymmetrical, it is best
described by a normal distribution with the mean and standard deviation
equal to 104.27 and 32.31 votes respectively, not the
log-normal distribution in Fig. 3(a). It is also unlike the distribution
of cascade sizes in a blog post network, which has a power
law distribution (Leskovec et al. 2007b). Remarkably, there
are no stories that did not generate a cascade, i.e., which did
not receive any fan votes.

The inset in Figure 4(a) shows the distribution of votes
from the submitter's fans only. It is also described by a normal
function with a mean around 50 votes. A small fraction of
stories, fewer than 400, did not have any votes from the
submitter's fans. This indicates that active users who are fans of
the submitter are also fans of other voters, i.e., that the social
network of active Digg users is dense and highly interlinked.
This observation is supported by the finding of a relatively
high clustering coefficient of the Digg social network.

The distribution of cascade sizes of Twitter stories is
shown in Fig. 4(b). The cascade sizes also appear to be normally
distributed, although a substantial number of stories do not
spread on the network. This distribution is broader than that
of Digg stories, which indicates that stories spread farther on
the Twitter network. The distribution of the number of votes
cast by the submitter's followers, shown in the inset in Fig. 4(b),
is markedly different from that on Digg. The vast majority of the
stories did not receive any votes from the submitter's followers,
indicating that the followers of the submitter and of other voters are
disjoint. This observation is supported by our finding that
the Twitter social network is sparsely interconnected.

Figure 2: Dynamics of stories on Digg and Twitter. (a) Total number of votes (diggs) and fan votes received by stories on Digg
since submission. (b) Total number of times a story was retweeted and the number of retweets from followers since the first
post vs time. The titles of stories on Digg were: story1: "U.S. Government Asks Twitter to Stay Up for #IranElection", story2:
"Western Corporations Helped Censor Iranian Internet", story3: "Iranian clerics defy ayatollah, join protests." The titles of
retweeted stories were: story1: "US gov asks twitter to stay up", story2: "Iran Has Built a Censorship Monster with help of west
tech", story3: "Clerics join Iran's anti-government protests - CNN.com."

Evolution of fan votes. Figure 5 shows how the number of
fan votes (size of the cascade), aggregated over all stories,
grows during the early stages of voting or retweeting. While
there is significant variation in the number of fan votes
received by a story, the aggregate exhibits a well-defined trend.
The solid black lines show the median cascade size, while
thin gray lines show the envelope of the boundary that is
one standard deviation from the mean.

The cascade grows steadily with new votes on Digg
(Fig. 5(a)), although faster initially, indicating that there are
two distinct mechanisms for story visibility on Digg. This
is seen more clearly in Fig. 5(b), which shows the probability
that the next vote is a fan vote and will increase the size of
the cascade. We separate votes cast before promotion from
those cast after the story is promoted. Before promotion, this
probability is almost constant, at p = 0.74. After promotion,
it decays to a lower, but also almost constant, value p = 0.3.
This is consistent with our hypothesis that before promotion
social networks are the primary mechanism for spreading
interest in new stories. Although a story is also visible on
the upcoming stories list, few users actually discover stories
there. With 16,000 daily submissions, a new story is quickly
submerged by new submissions and is pushed to page 15 of
the upcoming stories list within the first 20 minutes. Few
users are likely to navigate that far (Huberman et al. 1998).
Promotion to the front page, which generally happens when
a story accrues between 50 and 100 votes, exposes the story
to a large and diverse audience, making social networks less
of a factor in its spread, since large numbers of Digg users
who read front page stories do not befriend others.
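One simple way to estimate the probability that the next vote is a fan vote, before and after promotion, is to pool the fan-vote flags of all stories on each side of the promotion point. The sketch below is a simplification and an assumption: it returns one pooled value per regime instead of a curve over the number of votes, and the function name, the input layout, and the toy flags are hypothetical.

```python
def fan_vote_probability(stories):
    """Pooled fraction of fan votes before and after promotion.

    stories: list of (flags, promoted_at) pairs, where flags[i] is True
             if the i-th vote was a fan vote and promoted_at is the
             number of votes the story had when it was promoted.
    Returns (p_before, p_after); a value is None if no votes fall in
    that regime.
    """
    before = [f for flags, k in stories for f in flags[:k]]
    after = [f for flags, k in stories for f in flags[k:]]
    p_before = sum(before) / len(before) if before else None
    p_after = sum(after) / len(after) if after else None
    return p_before, p_after


# Two toy stories, each promoted after its fourth vote.
stories = [
    ([True, True, False, True, False, False], 4),
    ([True, False, True, True, False, True, False, False], 4),
]
p_before, p_after = fan_vote_probability(stories)
```

In the toy data six of the eight pre-promotion votes are fan votes and one of the six post-promotion votes is, mirroring qualitatively the drop reported in the text.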

The spread of interest in stories through the Twitter network,
shown in Figure 5(c), is similar to Digg. As on Digg,
the median number of fan votes rises steadily during the
early stages of voting. However, the rate of growth is nearly
constant, indicating there is a single significant mechanism
for making stories visible to voters, namely the social network.
The probability that the next retweet is from a fan, shown
in Fig. 5(d), rises slowly from around p = 0.4 to p = 0.55.
This value is lower than the pre-promotion probability of the next
fan vote on Digg. The rate of interest spread appears to depend
on the density of the network. Initially, Digg stories spread
faster through the social network than stories on Twitter,
because of Digg's denser network structure, but after promotion
they spread much more slowly as unconnected users see and
vote on the stories.

The dashed lines in Fig. 5(a) & (c) show how the median
number of votes from the submitter's fans or followers changes
with voting. By the time a story accumulates 50 votes on
Digg (at which point some of the stories are promoted to
the front page), about half of the votes are from the submitter's
fans, and another 10 are from fans of prior voters but not
the submitter. After a story receives about 100 votes (by
which point most of the stories are promoted), the number of
votes from the submitter's fans changes very slowly, while the
number of fan votes continues to grow. This indicates that
the submitter's fans vote for the story during its early stages and
that users pay attention to the stories their friends submit.
On Twitter, the initial votes are also from the submitter's followers,
but their number grows significantly more slowly later.

Related Work 

Several researchers have studied the dynamics of information flow
on networks; however, empirical studies have produced conflicting
results. Wu et al. (2004) examined patterns of email
forwarding within an organization and found that email
forwarding chains terminate after an unexpectedly small number
of steps. They argued that unlike the spread of a virus
on a social network, which is expected to reach many individuals,
the flow of information is slowed by decay of similarity
among individuals within the social network. They
measured similarity by distance in organizational hierarchy
between the two individuals within an organization,
or in general, as the number of edges separating two nodes
within a graph. Similarly, in a large-scale study of the
effectiveness of word-of-mouth product recommendations,
Leskovec, Adamic, and Huberman (2006) found that most
recommendation chains terminate after one or two steps.
However, the authors noted the sensitivity of recommendation to
price and category of product, leaving open the question
whether social networks are an effective tool for disseminating
information, rather than purchasing products.
Contrary to these studies, we find that information, such as
news, reaches many individuals within a social network.
Moreover, the reach of information spread does not seem
to depend on similarity between users, at least when similarity
is measured by the number of edges between them. On
Digg, whose users are highly interconnected, a story does
not reach as many fans as on Twitter, where users are less
densely connected.

Figure 3: Distribution of story popularity. (a) Distribution of the total number of votes received by Digg stories, with the line
showing log-normal fit. The plot excludes the 15 stories that received more than 6,000 votes. (b) Distribution of the total
number of times stories in the Twitter data set were retweeted, with the line showing log-normal fit.

Figure 4: Distribution of story cascade sizes. (a) Histogram of the distribution of the total number of fan votes received by
Digg stories (size of the interest cascade). The inset shows the distribution of the number of votes from submitter's fans. (b)
Histogram of the distribution of the total number of retweets from followers. The inset shows the distribution of the number of
retweets of a story from submitter's followers.



Like Wu et al., Liben-Nowell and Kleinberg (2008) studied
the patterns of forwarding of two popular email petitions.
Contrary to their expectations, the forwarding chains produced
long, narrow trees rather than wide, bushy ones. In these studies,
however, the structure of the underlying social network
was not directly visible but had to be inferred by observing
new signatures on the forwarded petitions. This method offers
only a partial view of the network and does not identify
all edges between individuals that participated in the email
chain. If an individual has already forwarded the message,
she will not do so again, and an edge between her and the
sender will not be observed. In our study, on the other hand,
the networks are extracted independently of data about the
spread of information.

Figure 5: Spread of interest in stories through the network. (a) Median number of fan votes vs votes, aggregated over all Digg
stories in our data set. Dotted lines show the boundary one standard deviation from the mean. Dashed lines show the number
of votes from fans of submitter. (b) Probability next vote is from a fan before and after the Digg story is promoted. (c) Median
number of retweets from followers vs all retweets, aggregated over all stories in the Twitter data set. (d) Probability next retweet
is from a follower.

A number of researchers have studied the flow of information
and influence in the blogosphere and in a virtual world.
(Gruhl and Liben-Nowell 2004) traced topic propagation
through blogs and used a model of the spread of epidemics
on networks (Newman 2002) to characterize the spread of
topics through the blogosphere. (Leskovec et al. 2007b) defined
an information cascade as a graph of hyperlinks between
blog posts. A cascade starts with a cascade initiator,
with other blog posts joining the cascade by linking to the
initiator or other members of the cascade. Leskovec et al.
found that the distribution of cascade sizes follows a power
law. In these studies, the networks were derived from the
observed links between blog posts, i.e., from the diffusion
of information. In our study, by contrast, they were extracted
from the sites independently of data about the diffusion
of information. (Bakshy, Karrer, and Adamic 2009)
traced the spread of influence in a multi-player online game
and found that, similar to our findings with social news,
influence spreads easily on social networks in virtual worlds.
This provides independent confirmation of the importance
of social networks in the dynamics of information
flow.
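
To make the cascade definition above concrete, the following minimal
Python sketch groups blog posts into cascades and reports their sizes.
It is an illustration only and not the procedure of Leskovec et al.: it
assumes that hyperlinks are given as (linking post, linked post) pairs
and treats every connected group of posts as a single cascade. The
function name cascade_sizes and the toy post identifiers are
hypothetical.

```python
from collections import Counter


def cascade_sizes(links):
    """Return cascade sizes (largest first) for a list of hyperlinks.

    links: iterable of (linking_post, linked_post) pairs.
    Assumption: posts connected by any chain of links form one cascade.
    """
    parent = {}

    def find(post):
        # Union-find lookup with path halving.
        parent.setdefault(post, post)
        while parent[post] != post:
            parent[post] = parent[parent[post]]
            post = parent[post]
        return post

    for source, target in links:
        root_source, root_target = find(source), find(target)
        if root_source != root_target:
            parent[root_source] = root_target

    # Count how many posts share each cascade representative.
    sizes = Counter(find(post) for post in list(parent))
    return sorted(sizes.values(), reverse=True)


# Toy data: posts b and c link to initiator a, d links to c,
# and y links to a second initiator x.
toy_links = [("b", "a"), ("c", "a"), ("d", "c"), ("y", "x")]
print(cascade_sizes(toy_links))  # prints [4, 2]
```

A histogram of the returned sizes on a log-log scale is one way to
inspect whether a heavy-tailed distribution, such as the power law
reported above, is plausible for a given data set.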



Conclusion 

We conducted an empirical analysis of user activity on Digg
and Twitter. Though the two sites are vastly different in their
functionality and user interface, they are used in strikingly
similar ways to spread information. First, on both sites users
actively create social networks by designating as friends others
whose activities they want to follow. Second, users employ
these networks to discover and spread information, including
news stories. The mechanism for the spread of information
is the same on both sites: users watch
their friends' activities (what they tweet or vote for),
and by their own tweeting and voting actions they make this
information visible to their own fans or followers. In spite
of the similarities, there are quantitative differences in the
structure and function of social networks on Digg and Twitter.
Digg networks are dense and highly interconnected. A
story posted on Digg initially spreads quickly through the
network, with users who are following the submitter also
likely to follow other voters. After the story is promoted to
Digg's front page, however, it is exposed to a large number
of unconnected users. The spread of the story on the network
slows significantly, though the story may still generate
a large response from the Digg audience. The Twitter social
network is less dense than Digg's, and stories initially spread
through the network more slowly than Digg stories do, but they
continue spreading at this rate as they age and generally
penetrate the network farther than Digg stories.
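
The spreading mechanism summarized above can be sketched in a few
lines of Python. This is a hedged illustration rather than the code
used in our analysis: it assumes that a "fan vote" is a vote (or
retweet) by a user who follows the submitter or any earlier voter,
and that votes are supplied in chronological order with the submitter
first. The function name cumulative_fan_votes and the toy user names
are hypothetical.

```python
def cumulative_fan_votes(voters, friends):
    """Return the running count of in-network (fan) votes.

    voters:  user ids in chronological order, submitter first.
    friends: dict mapping a user id to the set of users they follow.
    Assumption: a vote is in-network if the voter follows the
    submitter or any user who voted earlier.
    """
    earlier_voters = set()
    running = 0
    counts = []
    for position, user in enumerate(voters):
        follows = friends.get(user, set())
        # The submitter (position 0) is never counted as a fan vote.
        if position > 0 and follows & earlier_voters:
            running += 1
        counts.append(running)
        earlier_voters.add(user)
    return counts


# Toy story: s submits; a follows s; b follows nobody who voted;
# c follows b, who voted before c.
toy_voters = ["s", "a", "b", "c"]
toy_friends = {"a": {"s"}, "b": {"z"}, "c": {"b", "q"}}
print(cumulative_fan_votes(toy_voters, toy_friends))  # prints [0, 1, 1, 2]
```

Plotting the returned counts against the total number of votes gives
a curve of the kind shown in Figure 5(a) and (c); votes not counted
here are out-of-network votes.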

Understanding characteristics of user activity and the
effect social networks have on it will help us make
better use of social media and peer production systems.
Currently these systems blindly aggregate activities
of all users in order to identify high quality contributions.
However, since popularity and quality are rarely
linked (Salganik, Dodds, and Watts 2006), this method is
likely to highlight popular, though trivial, contributions.
Separating in-network and out-of-network user activity,
however, will lead to a better understanding of social dynamics
of peer production systems (Hogg and Lerman 2009;
Hogg and Szabo 2009; Lerman and Hogg 2010), which will
allow us to better separate high quality contributions from
noise (Hogg and Lerman 2010; Crane and Sornette 2008;
Lerman and Galstyan 2008).

Introduction 

Social scientists have long recognized the importance of social
networks in the spread of information (Granovetter 1973) and
innovation (Rogers 2003). Modern communications technologies, notably
email and more recently social media, have only enhanced
the role of networks in marketing (Domingos and Richardson 2001;
Kempe, Kleinberg, and Tardos 2003), information dissemination
(Wu et al. 2004; Gruhl and Liben-Nowell 2004), search (Adamic and
Adar 2005), and expertise discovery (Davitz et al. 2007). The
recent DARPA Network Challenge 1 successfully tested the 
ability of online social networks to mobilize massive ad-hoc 
teams to solve real-world problems, which could potentially 
improve disaster response and coordination of relief efforts. 
In addition to making social networks ubiquitous, social me- 
dia sites have given researchers access to massive quantities 
of data for empirical analysis. These data sets offer a rich 
source of evidence for studying the structure of social net- 
works (?) and the dynamics of individual (?) and group 
behavior (?), efficacy of viral product recommendation (?), 
global properties of the spread of email messages (?; ?) 
and blog posts (?), and identification of influential blogs (?;
?). In most of these studies, however, the structure of the
underlying network was not visible but had to be inferred from
the flow of information from one individual to another. This
posed a serious challenge to our efforts to understand how
the structure of the network affects dynamics of information
spread on it.

Copyright © 2010, Association for the Advancement of Artificial

Answering this question is especially critical for the
effective use of social media and peer production sys- 
tems, which often aggregate over activities of, or contri- 
butions made by, many people in order to identify trend- 
ing topics and noteworthy contributions. Most of these 
sites also highlight activities of a person's social network 
links. Since people create links to others who are similar to 
them, or whose contributions they find interesting, the dy- 
namics of information on a social network may be differ- 
ent from its dynamics within the general population. Sep- 
arating in-network from out-of-network activity allows us, 
among other things, to better estimate the inherent quality 
of the contributions (?) or predict their future activity (?; 
?). This will in turn allow us to separate high quality contri- 
butions from noise. 

Social news sites Digg and Twitter offer a unique opportu- 
nity to study dynamics of information spread on social net- 
works. Both sites have become important sources of timely 
information for people. The social news aggregator Digg 
allows users to submit links to news stories and vote on stories
submitted by other users. On the microblogging service
Twitter, users tweet short text messages that often contain
links to news stories, and comment on or retweet messages
of others. Both sites enable users to explicitly create links
to other users they want to follow. Another important com- 
mon feature is data transparency, with both sites providing 
programmatic access to detailed data about story and user 
activity. 

This paper presents an empirical study of the role of so- 
cial networks in the spread of information on Digg and Twit- 
ter. For our study we collected data about popular stories on 
Digg and Twitter that includes information about who voted 
or retweeted the story and when. In addition, we extracted 
the social networks of active users on these sites. These 
data sets allow us to empirically characterize individual dynamics
and network structure, and to map the spread of interest
in news stories through the network. First, we empirically
characterize the structure of social networks on both sites. 



